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Abstract 



Imaging in clinical oncology trials provides a wealth of information that contributes to the drug devel- 
opment process, especially in early phase studies. This paper focuses on kinetic modeling in DCE-MRI, 
inspired by mixed-effects models that are frequently used in the analysis of clinical trials. Instead of 
summarizing each scanning session as a single kinetic parameter - such as median '^ans across all voxels 
in the tumor ROI - we propose to analyze all voxel time courses from all scans and across all subjects 
simultaneously in a single model. The kinetic parameters from the usual non-linear regression model are 
decomposed into unique components associated with factors from the longitudinal study; e.g., treatment, 
patient and voxel effects. A Bayesian hierarchical model provides the framework in order to construct a 
data model, a parameter model, as well as prior distributions. The posterior distribution of the kinetic 
parameters is estimated using Markov chain Monte Carlo (MCMC) methods. Hypothesis testing at the 
study level for an overall treatment effect is straightforward and the patient- and voxel-level parame- 
ters capture random effects that provide additional information at various levels of resolution to allow a 
thorough evaluation of the clinical trial. The proposed method is validated with a breast cancer study, 
where the subjects were imaged before and after two cycles of chemotherapy, demonstrating the clinical 
potential of this method to longitudinal oncology studies. 
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1 Introduction 



Assessing the efficacy of cancer treatments using in vivo imaging is shifting from quaUtative techniques 
to quantitative imaging methods that characterize biologically relevant properties of tumor tissue. The 
use of model- free or heuristic measures, such as the initial area under the Gadolinium curve (lAUGC), or 
fully quantitative measures, such as the kinetic parameters from a compartmental model, are relatively 
well understood in the analysis of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) 
([TJ[2j[3l). Analysis of an oncology imaging trial is usually achieved by applying statistical summaries, such 
as the mean or median, to the parameters of interest derived from tissue regions of interest (ROIs). That 
is, enhancing (tumor) voxels are identified from the DCE-MRI data for each scan across all subjects and 
those voxels are represented by a single parameter; e.g., K*'"'^'^^ from quantitative analysis and IAUGC90 
from a heuristic analysis. Hypothesis testing, either parametric or non-parametric, may then be applied 
to the derived statistics in order to assess the effects of treatment. 

Applying statistical summaries to the kinetic parameter maps from DCE-MRI however discards a 
substantial amount of information contained in the contrast agent concentration time curves (CTCs) at 
each voxel, essentially abstracting thousands of observations in space and time to a single number per 
scan per subject. We believe that there is a wealth of potential information by retaining the collection 
of CTCs across all subjects and scans, acknowledging the fact that not all CTCs are the same and not 
all patients are the same. 

This paper proposes a Bayesian hierarchical model to analyze all tumor CTCs across all patients and 
scans in a given study simultaneously based on the concept of a mixed-effects model. Mixed-effects models 
are well established in the statistical community and have found widespread applications in, for example, 
agriculture, economics, geophysics and the analysis of clinical trials (jUE]). Mixed-effects models extend 
the concept of traditional linear or non-linear models by combining both fixed effects and random effects 
in the same model. More generally, mixed-effects models are most often used to describe relationships 
between the measured response and explanatory variables in data that are grouped according to one or 
more factors. Fixed effects denote parameters that are associated with an entire population and random 
effects denote parameters which are associated with random samples from a population. For example, the 
drug or radiation therapy given in a trial is a fixed effect because there is no randomness associated with 
it, whereas patients are inherently random because they are sampled from the population of all patients 
with a given disease. By acknowledging the fact that some parameters are associated with random 
samples from a population, we are able to generalize the results from a mixed-effects model beyond the 
collection of subjects used in the model fitting. 

Bayesian methods are used in the construction and estimation of the generalized additive model ([6l [7]) 
associated with each kinetic parameter in the model of the CTCs. Similarly to mixed-effects models in a 
maximum likelihood setting the variances associated with the fixed effects are chosen to be constant, but 
the variance terms associated with the random effects have prior distributions. This leads to a shrinkage 
estimation of the random effects so that the random effects are pushed towards zero ([8]). The fixed effect 
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in the model explain as much variance as possible, whereas the random effects capture variability that 
cannot be explained by the fixed effects. 

Formulation of a Bayesian hierarchical model is typically achieved in three stages: the data model, 
the parameter model and the prior parameters (jHl llOp . The data model reflects our knowledge of the 
CTCs at the voxel level using the class of compartmental models ([TTI [T^ [TS]) with a standard arterial 
input function (AIF) taken from the literature ([T^ [T51 fTB]) . The process model describes how model 
parameters are generated from underlying processes. At this step we decompose the kinetic parameters 
into treatment, patient and voxel effects. In Bayesian theory all parameters are regarded as random 
variables with pre-specified (prior) distributions. This includes the parameters for the fixed as well as 
the random effects in the model. Conjugate distributions with appropriately selected values are used 
as priors on all parameters in the model in order to limit their range during the estimation procedure. 
These choices also allow us to implement efficient sampling methods, wherever possible, to reduce the 
computational burden. 

As an illustration of the parameter model let -0 be a kinetic parameter of interest and suppose 
the imaging study acquires a dynamic MRI measurement sequence at two time points for each subject - 
before and after treatment. The generalized additive model uses the natural logarithm as the link function 
between the signal and effects that can be identified from the study design. For example, assume the 
imaging study included two time points (pre- and post-treatment) so the parameter ip may be decomposed 
via 

In 7/; = baseline -I- treatment -I- patient -I- (patient ■ treatment) -I- voxel. [ 1 ] 

With model fitting considering all effects related to the kinetic parameter -0, a curve that fits the observed 
data at a particular voxel (associated with a specific scanning session and patient) can be derived. This 
is illustrated in Figure [5] where pre- and post-treatment voxels have been selected from three subjects. 
The solid line in each plot is the fitted curve to the CTC at a single voxel including all effects in the 
model (Eq. [J). 

One can construct different versions of that involve a subset of terms in Eq. [Tj. For example, defining 
ipb = exp (baseline) produces the estimated baseline value of the kinetic parameter across the entire study, 
while ignoring patient- and voxel-specific information and ipt = exp(baseline -I- treatment) produces the 
estimated treatment value of the kinetic parameter across the entire study when patient- and voxel-specific 
information are ignored. In Figure the dotted line in each plot is the fitted curve corresponding to the 
posterior median CTC for the whole study pre-treatment (top row) and post-treatment (bottom row). 

Relative changes between baseline and treatment are also available by looking at the individual com- 
ponents in Eq. [Tj; e.g., ipf — exp(treatment) produces the percentage change due to treatment relative 
to the baseline value ipb- Given the mathematical properties of exponential functions, we know that 
"06 ■ Ipt' = Ipt, thus relating the relative changes attributed to specific effects in the generalized additive 
model to absolute values of the kinetic parameter. Similar manipulations may be performed to investi- 
gate patient- or voxel-specific effects and their relative changes from baseline. This figure is described in 
greater detail in the results section. 
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The goal of model construction and estimation presented here is to quantify the effect of drug treat- 
ment on disease - in this case breast cancer - through quantitative summaries of tumor microvasculature 
using DCE-MRI. The model framework we have adopted provides a unified treatment of imaging in- 
formation at the study level through simultaneous estimation of parameters at the voxel, patient and 
treatment level, allowing a thorough interrogation of the results. 



2 Bayesian Hierarchical Model 

Bayesian methods rely on the specification of prior distributions p{9) that express our information about 
the unknown parameters 6 before any measurements are obtained; i.e., our model assumptions. To assess 
the model parameters after observing the data, the posterior distribution p(0 | Y) can be computed, where 
6 is the vector of all unknown parameters and Y is the vector of observations. The posterior distribution 
of the parameter vector 6 is obtained by applying Bayes' theorem 

' ^ jp{e*) £{Y\9*)de*' 

where £(Y \ 6) denotes the likelihood function of Y and p{6) the product of all a priori probability 
distribution functions. One can think of the posterior as an update to the prior distribution, our beliefs, 
on 6 after measuring a process - producing a mixture of previous knowledge and experimental data. 
Bayesian methods are inherently iterative, since the posterior distribution can become our new prior 
distribution and, be combined with new measurements of the data generating process at a later date, to 
produce an updated posterior distribution. 

The following sections introduce the key components in the Bayesian hierarchical model: the data 
model, the parameter model and the prior parameters. Each stage of the model development has been 
tailored to the analysis of a longitudinal cancer treatment study with two time points. Figure [T] provides a 
schematic overview of the proposed Bayesian Hierarchical Model (BHM). The three model stages are the 
rows and the columns represent the "resolution" of the parameters. For example, K*'"'^'^^ is decomposed 
into global (study- wide), subject and voxel effects through the BHM where as Vp is simply estimated for 
each voxel without further decomposition. The measurement error term is independent of the specific 
parameter model and involves both prior and hyperprior distributions. A standard compartmental model 
is used to describe the concentration time curves observed at each voxel. A generalized additive model 
is proposed to decompose the kinetic parameters into factors that are relevant to the design of the 
longitudinal study. Finally, the prior distributions, including necessary hyperparameters, are specified on 
all factors of the parameter model. These prior distributions are flat in most cases, reflecting a lack of 
knowledge concerning the parameter, but also incorporate biological knowledge, such as a transfer rate 
must be non-negative, or statistical knowledge, for example a variance must be non-negative. 
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2.1 Data Model 

A hierarchical Bayesian framework is used to model the contrast agent concentration time curve (CTC) 
of all voxels (|T7|) . Let Y — [Y{ti),Y{t2), ■ ■ ■ ,Y{tT)]^ denote the CTC associated with a single voxel 
observed at T time points determined by the image acquisition protocol. The CTC is assumed to follow 
a standard compartment model (jl6p 

Ct{t) = VpCpit) + Cp{t) ® if'--^"^ exp(-i /cep), [3] 

where denotes the convolution operator, K^'^^'^^ represents the transfer rate from plasma to extracellular 
extravascular space (EES) per minute, fccp the rate constant between EES and blood plasma per minute 
and Vp the vascular space fraction. The choice of model for the CTC depends on the scientific goals of the 
study. Replacing Eq. [3] with a more or less complicated model is straightforward in this model-building 
framework. The observed vector Y may therefore be thought of as noisy observations of the true contrast 
agent concentration Ct{t) given by a draw from a multivariate Normal distribution 

Y~NT(Ct,a2/T), [4] 

where the notation Y ~ N(/i, cr^) means that the random variable Y is drawn from the Normal distribution 
with parameters /i and a^. 

We assume a common arterial input function (AIF), taken from the literature for all patients in the 
study, and we follow the work of Tofts and Kcrmode (:12) by using a bi-exponcntial function 

Cp{t) — D[ai exp(mii) + 02 exp(m2i)], [5] 

where ai — 24.0 kg/1, 02 = 6.20 kg/1, mi = 3.00 min^^ and TO2 — 0.016 min~^ are inspired by the work 
of Fritz-Hansen et al. 

A Bayesian implementation of the compartmental model above was proposed in Schmid et al. (jlSp . 
Since the Bayesian model framework does not depend on any optimization procedure, it will produce 
valid parameter estimates when estimation via nonlinear regression fails to converge. Samples from the 
posterior distribution are built up during the model fitting procedure for each parameter. Hence, the 
posterior distribution may be used to obtain additional information on the accuracy and precision of the 
estimates. For example, the standard error of the posterior is the estimation error. Statistics of interest 
may be derived from the posterior distribution (e.g., mean, median, quantiles, etc. ) so that not only 
point estimates but also confidence intervals are readily available. 

2.2 Parameter Model 

The pharmacokinetic (PK) parameters from the data model are estimated at every tumor voxel across 
all subjects and scans. We assume a priori that the distribution of the random variables K^'^^'^^ and fcop 
in the tumor are patient-specific and are changed by treatment in a similar way. Therefore a generalized 
additive model is used where the log-transformed kinetic parameters \u{K^'^'^^^^) and ln(/cep) are expressed 
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as a linear combination of fixed- and random-effects associated with identifiable factors in the study. 
In addition to mathematical convenience, the log transform is also appealing in this context because 
individual terms in the additive model may be interpreted as percentage change from baseline. We 
assume that the distribution of the vascular fraction Vp will not be changed by the treatment, however 
single Vp values will be changed. Let i ~ 1, . . . , / denote the scans acquired and let j = 1, . . . , J denote 
the patients, so that denotes the number of tumor voxels for patient j at scan i, measured at T 
time points. The transfer rate constants in Eq. [3] are assumed to be non-negative and estimated in 
log-transformed space. That is, let tpi = ln(ii:'™""), V2 = In(fcop) and ip = [ln(X*'^""), In(fcop)]"^. 

The factor of interest when measuring a change in the kinetic parameters is the treatment effect, or 
the difference between scan i = 1 and scan i — 2 when only pre- and post-treatment images are acquired. 
We acknowledge the fact that substantial variability exists across patients in the study and between the 
voxels in each region of interest (ROI) that describes the enhancing region in the acquisition. Hence, the 
model for ln{K^'''^'^^) is given by 



Ipijkl = [1 Xt] 







+ [1 X,] 


7ji 








. ^1 . 







+ eijki, ioT all i,j,k 



\6] 



7] 



where 

1 scan 1 = 2; 
otherwise. 

The parameter ai is the value of ln(ii'*™'') associated with the baseline scan and /3i is the treatment 
effect (since it is only associated with the post-treatment acquisition). These parameters are regarded 
as fixed effects (the global column of Figure [T|) , and thus do not vary between patients in the study. 
In the Bayesian framework, a marginal posterior distribution will be available for each parameter. The 
parameter 7^1 is the effect of patient j on ln(iir*''™'') and Sji is the interaction between patient j and 
treatment. These parameters are random effects since each patient is assumed to be drawn from a 
larger population of patients suffering from this condition (the subject column of Figure [T]). Finally, the 



parameter eijki is the random effect of voxel k in scan i of patient j on ln(_R'' 



The voxel effect 



acknowledges the fact that each voxel in the tumor volume is drawn from a distribution that describes 
the ideal tumor voxel (the voxel column of Figure [1]) . The combination of fixed and random effects in a 
single model is commonly referred to as a mixed-effects model 

Using matrix notation, we can combine the generalized additive model across both kinetic parameters. 



jj^^trans Infccp, such that 



X,; 



1 a;, 
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ai 




7ji 
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The scan-specific covariates in the model are captured in , the fixed effects are in (pi , the patient-specific 
random effects are in 6ji and the voxel-specific random effects are in eijki ■ The model formulation in 
Eq. [8] can be adapted in order to incorporate a greater number of scans in a longitudinal study. 

2.3 Prior Models 

In the Bayesian framework prior information with unknown variance is used to model the random effects. 
We use vector notation to denote the patient-specific random effects such that 



7 = 



711 

721 

7J1 

712 



and S = 



Sii 
Sji 

5l2 



[10] 



7J2 J L '^J2 

where we have dropped the kinetic parameter subscript to simplify the notation. We draw from mul- 
tivariate Gaussian distributions to characterize the prior distribution of the unknown variances for the 
patient-specific random effects, i.e., 



7 ^ N2.7(0,diag(T^)), 
S ^ N2j(0,diag(T^)), 



[11] 
[121 



where and are vectors of the same length and indexed as 7 and d, respectively. The voxel-specific 
random-effect vectors are given unique prior distributions by scan, patient and parameter, so that each 
vector is given by ey; = [eiju, £^2;, ■ • • , ^ijuijiV ^^'^ it is drawn from a multivariate Gaussian distribution 
via 



[131 



where is the number of voxels in the region of interest of scan i of patient j, and t^.^^ j is the unknown 
variance associated with scan i, patient j and parameter I. Since the variances are unknown parameters, 
they must have their own prior distributions which are given by 

2 iid 



7 

2 iicl 



IG(1,1) 
IG(1,1) 



IG(l,10-5) 



[14] 
[15] 
[16] 



where IG(a, 6) denotes the Inverse Gamma distribution (19), allowing only non-negative values. The 
inverse Gamma distribution is a conjugate prior for the Normal distribution. For the fixed effects we use 
fiat priors; i.e., the prior distribution does not contain any relevant information, such that 



p{ai) = p{(3i) — constant for ^ = 1, 2. 



[17] 



8 



The prior distributions on the coefficients in the generahzed additive model are chosen so that as much 
variance in the data is explained by the fixed effects a and /3 - as no prior information is used for those 
parameters. Variability which cannot be explained by the fixed effects will be covered by the random 
effects 7 and 6. For these parameters an additional prior distribution (hyperprior) on the variance of 
the parameters is defined, which leads to a shrinkage of those effects, that is the parameters are pushed 
towards zero and therefore do not cover variance explained by the fixed effects. Any additional variance 
is explained by the voxel effects. 

For the vascular space fraction we impose a relatively flat prior 

Vp;ijk B(l, 19), for all k, [18] 

where B(a, 6) denotes the Beta distribution pp]). so that the a priori expected value of Vp is 0.05. 
The Bayesian hierarchical model is complete by specifying a prior distribution for the variance of the 
observational error in Eq. [4], with one variance parameter per scan per patient, 

afj ~ IG(1, IQ-^) for all i,j. [19] 

3 Materials and Methods 
3.1 Data acquisition 

The first twelve patients from a previously reported breast cancer study are included in the analysis 
([2l ] [T8|) . Data were provided by the Paul Strickland Scanner Centre (PSSC) at Mount Vernon Hospital, 
Northwood, UK. Each patient underwent a DCE-MRI study before and after two cycles of chemotherapy 
(5-fiuorouracil, epirubicin and cyclophosphamide). Six of these patients were identified as pathological 
responders after receiving six cycles of chemotherapy, the others were non-responders. 

For the calculation of Ti values, we used a two-point measurement with calibration curves as described 
in [^5]) . The Ti values are computed as ratio of a Ti-weighted fast low-angle shot (FLASH) image and 
a proton density weighted (PDw) FLASH image. The imaging parameters of the Ti-weighted FLASH 
images were TR =11 ms, TE ==4.7 ms, a — 35°, the parameters of the proton density- weighted image 
were TR = 350 ms, TE = 4.7 ms, a ^ 6°. Field of view was the same for all scans, 260 x 260 x 8 mm 
per slice, so voxel dimensions were 1.016 x 1.016 x 8 mm. A scan consists of three sequential slices of 
256 X 256 voxels and one slice placed in the contra lateral breast as control, which we do not use for 
our analysis. A total of 40 to 50 acquisitions were acquired, with one acquisition each 11.9 seconds. A 
dose of D = 0.1 mmol per kg body weight of Gd-DTPA was injected after the fourth scan using a power 
injector with 4 ml/s with a 20 ml saline flush also at 4 ml/s. The flrst four scans, before contrast, were 
used to compute Tig as the average of the Ti values of these images. Data from this study were acquired 
in accordance with the recommendation given by (24). Informed consent was obtained from all patients. 

Regions of interest (ROIs) were drawn manually by an expert radiologist on a scan-by-scan basis using 
anatomical images and subtraction images from the dynamic data to define tumor voxels in pre and post 
treatment scans. 
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3.2 Parameter Estimation via MCMC 

The joint posterior distribution of all parameters was assessed using Markov chain Monte Carlo (MCMC) 
(PS)) . After a initial burn- in phase of 10,000 iterations, another 100,000 iterations were computed. To 
ensure independent samples only each 100th sample was used, giving us a total of 1000 samples to 
describe the posterior distribution. The global parameters and patient-specific Oj were drawn en bloc 
in Gaussian Gibbs steps (|26p. and hyperparameters were drawn in independent Gamma Gibbs steps; 
technical details can be found in Appendix lA.ll Metropolis-Hastings steps with random walk proposals 
were necessary for the voxel-specific random effects and vascular space fraction. The algorithm was 
tuned to an acceptance rate of 30-50% (27). Summary statistics were computed from the samples of 
the posterior distribution to provide point estimates of the parameters. Empirical standard errors, along 
with sample quantiles, were used to characterize the precision of the parameter estimates. 

4 Results 

All parameter estimates are derived from the posterior distribution using Bayes theorem. Hence, a 
sampling distribution for each parameter value has been built up from which we can produce a point 
estimate via the median of the sample and also credible intervals (Bayesian confidence intervals) by using 
the quantiles from their sampling distributions. 

How the individual parameters from the generalized additive model coalesce to fit the observed con- 
trast agent concentration time curve is illustrated, at the voxel level, in Figure [2l The observed CTCs 
for two voxels from three subjects, one voxel at baseline and one voxel after treatment, are plotted along 
with three fitted curves. The best estimate from the Bayesian hierarchical model at a specific voxel is 
provided by the solid lines in each plot. That is, all parameters from the generalized additive model 
(Eq. [5]) are used in the parameter model in order to fit the data model. These curves are very similar 
to, but not exactly the same as, model fits from the standard non-linear regression method used in the 
quantitative analysis of DCE-MRI data (18). Removing the voxel-specific term from the model produces 
a fitted curve that is associated with patient and treatment effects, but not the specific voxel, and are 
plotted as dashed lines in Figure [2l Given the presence of inter- voxel heterogeneity in the tumor ROI, 
the dashed lines may or may not fit the observed data at a given voxel very well but they do represent 
the best (in the sense of a posterior median) fit to all voxels in the tumor ROI for a given patient at a 
single scan time point. Going back one more level in the generalized additive model and removing the 
patient effect leaves a fitted curve associated with the baseline and post-treatment scans (i.e., two curves 
that summarize the overall treatment effect) given by the dotted lines. The top row of Figure [2] contains 
voxels from three subjects before treatment so the dotted lines are identical and represent the best (in the 
sense of a posterior median) fit to all pre-treatment voxels across all subjects. The bottom row contains 
voxels from the same subjects after treatment and the dotted line is the best fit to all post-treatment 
voxels. 

Figure [3] shows the posterior distributions of pre-treatment (baseline) i^'ti'^n^' and post-treatment 
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^trans jg^ posterior samples were transformed via exp(Q;i) and exp(ai + respectively. For 

ease of comparison between the two posterior distributions a smoothed version of each histogram, known 
as a kernel density estimate, is displayed (28). The posterior median of K^''^'^^ at baseline is 0.205, and the 
posterior median of K^''^'^^ after treatment is 0.156. Credible intervals for K^'^^'^^, that cover 95% of the 
posterior distribution, are [0.186,0.234] at baseline and [0.121,0.198] after treatment. A credible interval 
is a posterior probability interval. That is, the true value of i^T*''*"*' lies in the interval [0.186, 0.234] with 
posterior probability 0.95 at baseline and in [0.121,0.198] with posterior probability 0.95 after treatment. 

The density estimates in Figure [3] are unimodal and indicate an overall decrease in K^'^'^'^^ after 
treatment. In order to test for a treatment effect on K^"^'^^^ specifically a reduction in K^''^'^^ in the 
second acquisition compared to the first, we construct the hypothesis 

Ho: (3i>0 versus Hi : (3i < 0, [20] 

using the treatment effect from the parameter model (Eq. [8]) and calculate the posterior probability of 
Pi exceeding zero. From the results of the MCMC simulation, the null hypothesis (Eq. [20]) is rejected 
with p = 0.001. 

When introducing the generalized additive model previously the fact that the parameter K^'^^'^^ and the 
covariates are linked through a logarithmic transform leads to the interpretation of individual covariates in 
the parameter model as percentage changes from baseline instead of absolute changes. For the treatment 
effect this translates into a 100% • [0.7659 — 1| = 23.3% median reduction in K^'^'^'^^ from baseline, where 
the sign determines whether the change is associated with an increase or decrease. 

Figure m shows the patient-specific posterior distributions for pre-treatment K^'^^'^^, given by exp(ai + 
7ji) for j = 1, . . . , 12, and post-treatment X*'''*"'^, given by exp(ai + f3i -f 7ji +Sji) for j = 1, . . . , 12. The 
clinical responders are grouped in the first two columns of Figure H] and the clinical non-responders are 
in the third and fourth columns. The same range for x-axis [0, 0.45] was used in all plots of /C''''"^ for 
comparison. In general the decrease in K^'^^^^'^ observed in the clinical responders is greater than the clinical 
non-responders, but this is not absolute. For example, patient 12 shows only a small decrease in K^''^'^^ 
post-treatment and patient 6 shows an increase in K^'^'^'^^ after treatment, but both are clinical responders 
after additional chemotherapy. The interpretation of the treatment effect as a percentage change from 
baseline helps to quantify the results in Figure H) The median percentage change in K^''^'^^ for subject j 
is obtained via 100% • |exp(/3i + 6ji) — 1\, where the sign determines whether an increase or decrease 
occurred. For example, patient 1 (pathological responder) experienced a 100% • [0.7684 — 1| = 23.2% 
median reduction in K*''"^'^^ which is very similar to the overall treatment effect. This is definitely not the 
norm as patient 9 experienced a 100% • [0.4285 — 1| = 57.2% median reduction in i<f*''™^ and patient 6 
experienced a 100% ■ [1.0817 — 1| = 8.17% median increase in A'*'''"'', both were pathological responders. 

Figure [5] shows the voxel-specific median posterior for pre- and post-treatment K^'^^'^^. The clinical 
responders are grouped in the first two columns and the clinical non-responders are in the third and 
fourth columns (identical to Fig. |4|). The range for the x-axis was restricted to [0,1] in all plots for 
comparison. Given the number of samples from the posterior distribution across all voxels, the median 
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value of exp(ai + jji + eijki) for j — 1, ... ,12; k — 1, ... , and exp(ai + f3i + 7ji + Sji + €2jki) for 
j = 1, . . . , 12; k = 1, . . . , n2j across the 1000 samples, for each voxel fc, was computed to summarize the 
voxel effect. The resulting histograms for the voxel effect have been summarized by a kernel density 
estimate. Most voxel-level distributions of median K^'^^'^^ show a substantial change in shape after 
treatment, although this is more apparent in the responders compared to the non-rcspondcrs. It is 
interesting to note the extent of changes in the shape of these distributions between the different subjects. 
For example, patient 11 is characterized by a tumor with two distinct modes in estimated iC'''^"^ at 
baseline and a single mode after treatment. Even more interesting is the fact that the post-treatment 
distribution of K*^'^'^'^^ is in between the two modes at baseline. The distributions of median K^'^^'^^ for 
patient 12 show the reverse effect, albeit much more subtle than patient 11, where the post-treatment 
distribution of median X*''*"*' appears to be bimodal but still spans a similar range of values. 

5 Discussions and conclusions 

Information is obtained at multiple levels during an imaging study in the clinical trial setting. The 
main scientific question of interest is usually, was there a treatment effect? This key hypothesis test 
drives study design by influencing critical experimental design parameters such as power and sample 
size. However, information at other levels, such as the patient or voxel level, can provide insight into 
much more subtle features concerning patients, tumors and the treatment effect. Patient variability with 
application to predicting clinical response and tumor heterogeneity, as measured by voxel-wise properties 
of the pharmacokinetic model, are just two examples of so-called secondary endpoints. 

The Bayesian hierarchical model presented here was developed to test the hypothesis of a treatment 
effect for an imaging study while acknowledging known sources of uncertainty; e.g., patients and voxels. 
This is similar to the approach taken in standard analysis methods for clinical trials where fixed and 
random effects are identified in the model. The specification of fixed and random effects allows the 
results from the study to be applicable beyond the specific patient population recruited for this specific 
study. 

A standard analysis was performed on the ROIs and the median K^^'^'^'^ values have been summarized 
in Tabled] A non-parametric test (a one-sided Wilcoxon signed rank test) was performed to test that 
the difference between the median values was greater than zero; i.e., the treatment did not reduce K^''^'^^ 
across all subjects. The null hypothesis was rejected at a borderline significance level {p = 0.055). Given 
the small sample size, A^i = 6 responders and N2 = 6 non-rcspondcrs, this is an impressive result and 
there is obviously a reasonable difference in i^T*''^'^'' between the two groups. 

Figure [6] shows the kernel density estimates of JC*''*"*' for each ROI, before and after treatment, using 
a voxel-wise non-linear regression analysis. That is, the compartmental model in Eq. [3] was fit to each 
voxel independently using the Levenberg-Marquardt optimization procedure. The empirical distributions 
observed for each patient are extremely similar to those obtained in the BHM. This is to be expected 
given the relatively flat priors that were imposed on the kinetic parameters (|18). 
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While the voxel-wise results from the Bayesian and regression methods are very similar, and thus 
provide a check on the consistency of the Bayesian model fitting procedure, the advantages of the Bayesian 
hierarchical model are clear through the coefhcients from the generalized additive model (Eq. [8]). The 
regression analysis can only summarize the study through Table [1] but the BHM allows one to isolate 
and interrogate specific effects, at the study or patient or voxel level, through the generalized additive 
model. Examples of such interrogations have been presented here in Figures [3] and 31 but the possibilities 
for such model summaries are only limited by the construction of the parameter model. 

Bayesian models rely on a priori beliefs about the model and parameters, expressed as prior distribu- 
tions. In general, a flat prior provides similar information to a maximum likelihood approach, and hence 
similar results. However, in the Bayesian hierarchical model proposed here the choice of prior distribution 
is critical in specifying the model. We used flat priors on the baseline a and on the treatment effect /3, 
and thus the approach is similar to a non-Bayesian or frequcntist approach. For the patient specific 
effects 7 and 5 we used Gaussian priors with unknown variances; this is also known as shrinkage prior, as 
it shifts the parameters towards zero. Hence the patient-specific effects only pick up the deviation from 
baseline and treatment effect. The voxel effect was also given a shrinkage prior with a more informative 
hyperprior distribution on the variance, hence it only picks up variability after modelling the baseline, 
treatment and patient-specific effects. 

In this paper a generalized additive model was constructed for the kinetic parameters {^K^''^^^ and 
fcep) in a compartmental model. This model incorporated two scanning sessions, and all subjects, to 
asses the effect of treatment. The modeling framework is easily extended to handle additional covariates 
or scanning sessions. For example, a dose-ranging study design could be incorporated into the additive 
model where the treatment effect can be expressed as a function of the dose. Additional scans over time 
would enable the assessment of temporal dependence on treatment and provide information about the 
reliability of the data by potentially reducing the amount of uncertainty in the parameter estimates. 

Another possible extension of this model would be to include the spatial information of adjacent voxels. 
In the current implementation of the Bayesian hierarchical model all voxels from one region of interest 
(tumor) were treated as spatially independent. Since voxel borders are arbitrary and do not represent 
physiological boundaries between different tissue types, it is likely that neighboring voxels share similar 
perfusion characteristics. This fact has been taken advantage of in the context of Bayesian modeling of 
individual scans from a DCE-MRI study (18). The inclusion of a neighborhood structure in the modeling 
process would reduce the uncertainty in estimation and provide more reliable estimates of the kinetic 
parameters. 
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A Appendix 



A.l Full conditional distributions 

In each iteration of the MCMC (Markov chain Monte Carlo) algorithm, a random sample of the marginal 
posterior distribution for all parameters is drawn. This is performed by drawing from the conditional 
posterior distribution of one or more parameters given all other parameters and the data. Hence, the 
full conditional distributions must be computed. The full conditional is denoted by | ■, where 9 is the 
parameter and • denotes all other parameters and the data. If the full conditional takes the from of a 
standard distribution, one can sample directly form this distribution; this is known as the Gibbs sampler 
([25|). If the full conditional is not a standard distribution, then a Metropolis-Hastings sampler must be 
constructed. 

In the proposed Bayesian hierarchical model all full conditionals are from standard distributions due 
to the use of conjugate prior distributions, except for the voxel effect and Vp. Let ~ {ai, Pij'fi, Si) 
denote the vector of length P — /( J+ 1) associated with all parameters in the generalized additive model, 
except the voxel effect, for a specific kinetic parameter. The full conditional of is a multivariate Normal 
distribution given by 

|;|. ^ Np(V-im,V-i), 
m = [mi,...,mp]^, 

"^P X! X! I '^"'"'1 X! ^iop^vki I for p = 1, . . . , P, 

i=i j=i \ k=i I 

V = W'^AW + diag(0,0, T^;i/,...,T^;ji,T5;i;,...,r5;ji), 

where W is a I{J + 1) x P matrix indicating which covariate should be included in the parameter model 
(Eq. 0) and A is a diagonal matrix with elements riijT^-ij. The vector is drawn in one block from a 
multivariate Normal distribution with an efficient block-sampling algorithm (j29p. 

The full conditional distribution of the voxel effect Eij^i is a non-standard distribution. For computa- 
tional reasons it is more convenient to sample from ^ijki rather than from e^fej. where the full conditional 
distribution of tpiju is given by 



^3(V'^ife^ I ■) oc exp I -^T,,ijk^fjki - 1^ {^loki - %ki) 



jk 

Note, Yij}~i is the estimated contrast agent concentration curve given by the estimated model parameters 
in ipijki- Draws from this distribution are obtained using a Metropolis-Hastings step. 

The full conditionals of all variance parameters are inverse Gamma distributions, which are given by 

r,^|. IG (1.5,1 + 710, [22] 

r||. IG(1.5,1 + <5|0, [23] 



rt\- - IG\1 + \j2j2n.,. 10- 

1=1 3 = 1 








- ^HoM ) ■ [24] 
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Hence, the variance parameters can be drawn independently. 
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Figure Captions 

1. Schematic overview of the Bayesian hierarchical model for the observed contrast agent concentration 
time curves. 

2. Contrast concentration time curves (CTCs) for pre- and post-treatment scans in three different 
subjects. Solid lines indicate the voxel-specific fit to the CTC, dashed lines the combined patient- 
and treatment-specific CTCs, and dotted lines the global pre- and post-treatment median CTCs 
for the entire study. 

3. Samples from the marginal posterior distributions of if*''™^ at the study level. At pre-treatment 
^trans jg gjygjj exp(ai) and at post-treatment K*'^'^'^^ is given by exp(ai + f3i). 

4. Samples from the marginal posterior distributions of /i'*''™^ at the patient level. At pre-treatment 
^trans jg giyen by exp(ai -|- 7^1) for patient j and at post-treatment K^'''^^^ is given by exp(ai -|- 
01 + 7ji + Sji) for patient j. 

5. Smoothed histograms summarizing the values of the posterior median K*''^^'^^ at the voxel level. 
At pre-treatment K*"^^^^ is given by exp(a'i + 7ji + eijfei) for scan 1, patient j and voxel k. At 
post-treatment K^'^'^'^^ is given by exp(Q!i + f3i + jij + Sij + ei2jk) for scan 2, patient j and voxel k. 
The a;-axis has been restricted to [0, 1] for visualization. 

6. Smoothed histograms summarizing the values of K^'''^^^ from voxel-wise non-linear regression anal- 
ysis. The a;-axis has been restricted to [0, 1] for visualization. 
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Table 1: Median K^"^^^^ values from the standard analysis (R = responder, NR = non-responder). 
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